Problem

The  programmer  in  a  distributed  processing  environment  must  be
provided  with  a  set  of  facilities  which  permit  easy  specification  of  the 
distributive  properties  of  his/her  program.  The  word  program  here  is 
used  to  refer  to  either  the  output  of  a  single  compilation  or  the  output 
of  independent  compilations  of  program  modules  which  are  to  be 
communicating  via  an  IPC.  These  distributive  properties  include  the 
specification  of  the  concurrency,  data  flow,  resource  requirements 
(memory,  devices,  etc.),  and  intraprogram  (intermodule)  protocol 
properties  inherent  in  the  execution  of  a  configuration  (system)  of 
cooperating  software  modules.  Given  a  description  of  these  properties, 
an  operating  system  must  be  able  to  distribute  the  user's  program  across 
multiple  machines  in  a  manner  which  is  transparent  to  the  programmer. 
Traditional  approaches  to  providing  these  facilities  include  the 
concurrency  support  in  high-level  languages  and  the  resource  allocation 
and  concurrency  support  in  conventional  operating  systems. 

Current  Approaches

Several  high-level  languages  such  as  Concurrent  Pascal  [1]  and  SP/K
[2]  have  incorporated  the  monitor  [3,4]  concept  to  provide  structured
concurrency.  This  concept  is  excellent  in  a  centralized  system  but 
relies  on  shared  data  (and  therefore  shared  memory),  and  is  therefore 
not  an  appropriate  concept  on  which  to  base  a  distributed  system. 
However,  an  effort  is  underway  at  the  National  Physical  Laboratory  [5]
to  distribute  a  Concurrent  Pascal  program  across  loosely  coupled
microprocessors.  The  distribution  of  passive  system  components  (such  as
monitors)  on  disjoint  machines  implies  many  copy  operations  for
parameters  and  also  additional  active  system  components  (processes)
which  do  not  appear  in  the  program  text. 

A  much  more  appropriate  high-level  language  concept  for  distributed 
programs  is  proposed  by  C.  A.  R.  Hoare  in  reference  [6].  Each  function 
is  a  sequential  process  which  is  connected  to  other  communicating 
sequential  processes  via  input/output.  This  concurrency  support  is 
based  on  data  flow  and  not  shared  data;  therefore,  it  is  not  dependent 
on  shared  memory.  As  a  result,  each  function  is  distributable. 
However,  it  seems  that  buffering  of  data  between  processes  is  necessary 
to  improve  performance  in  distributed  systems  with  slow  speed 
connections.  Since  the  compiler  for  such  a  language  presumably  can 
generate  the  resource  requirements  for  the  program,  since  processes  are 
identified  by  name,  and  since  the  protocol  between  processes  is  fixed, 
enough  knowledge  is  available  to  distribute  a  set  of  processes  which  are 
compiled  together. 

A  second  area  of  programmer  concern  for  distribution  occurs  because 
concurrent  program  functions  (modules)  may  be  separately  generated 
(compiled).  These  may  well  be  existing  programs  or  just  separate 
functions  based  on  programming  style.  The  interconnection  of  these 
modules  into  a  program  is  dynamic  and  therefore  requires  operating 
system  support.  In  early  conventional  operating  systems,  the  support 
for  combining  these  functions  into  a  configuration  of  communicating 
concurrent  software  functions  is  specified  at  three  levels.  First, 
overlap  of  CPU  and  I/O  is  made  available  for  standard  I/O  file 
functions.  Second,  added  concurrency  is  achieved  only  with  unstructured 
(low-level)  facilities  for  process  creation,  naming,  and  communication. 
Third,  complex  job  control  languages  are  provided  to  achieve  allocation
of  resources  to  run  these  functions.  In  a  distributed  system,  these  JCL
steps  must  be  synchronized  across  machines.  Complex  resource  control  in
a  distributed  system  should  certainly  not  be  the  programmer's
responsibility.  This  burden  is  alleviated  by  viewing  distributed  operating
systems  and  their  executable  programs  as  cooperating  processes.  A 
highly  successful  system  is  the  Distributed  Computing  System  of  Farber 
[7].  In  this  system,  the  structure  and  distribution  of  the  set  of 
processes  is  transparent  to  the  user;  and  a  high  level  of  concurrency  is 
achieved  without  use  of  low-level  process  control  primitives. 

Process  naming  of  cooperating  processes  is  still  burdensome  to  the 
programmer.  The  same  problem  also  occurs  in  current  "mailbox"  schemes 
as  epitomized  by  the  VAX  11/780  system  [8].  The  naming  or  numbering  of 
mailboxes  must  be  known  to  the  programmer  or  a  creating  process.  This 
is  commonly  referred  to  as  the  IPC-setup  problem,  a  term  coined  by  Elliot
Organick  in  reference  [9].  The  designers  of  UNIX  [10,11]  sought  to
alleviate  this  problem.  They  invented  the  "pipe."  In  UNIX  a  user 
program,  running  in  its  own  process,  may  take  the  place  of  a  file  in  a 
manner  which  is  transparent  to  the  original  program.  Each  program  may 
have  its  standard  input  and  output  files  replaced  by  programs,  thus
building  via  the  UNIX  shell  arbitrarily  long  linear  chains  (a  pipeline) 
of  programs.  UNIX  automatically  transfers  the  data  between  processes 
and  synchronizes  the  processes  as  it  intercepts  the  standard  input  and
output  file  operations. 
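
The chaining described above can be sketched in a few lines. This is a minimal illustration, not the UNIX shell's implementation: it assumes Python's standard subprocess module, and the helper name run_pipeline and the three stage commands in STAGES are hypothetical. Each stage runs in its own process, and the parent only wires one stage's standard output to the next stage's standard input.

```python
import subprocess
import sys


def run_pipeline(stages):
    """Run each argv in `stages` as its own process, stdout feeding the next stdin."""
    if not stages:
        return b""
    procs = []
    upstream = subprocess.DEVNULL  # the first stage gets an empty standard input
    for argv in stages:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # The child now holds the read end; the parent drops its copy so
            # that end-of-file and broken-pipe conditions propagate.
            upstream.close()
        procs.append(proc)
        upstream = proc.stdout
    output = upstream.read()  # drain the last stage until it closes its output
    upstream.close()
    for proc in procs:
        proc.wait()
    return output


# Hypothetical three-stage chain: produce lines, sort them, upper-case them.
STAGES = [
    [sys.executable, "-c", "print('b'); print('a'); print('c')"],
    [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"],
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
]
```

None of the stage programs names its neighbours; each reads standard input and writes standard output, which is the property the text attributes to pipelines.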

UNIX  "pipes"  eliminate  the  need  for  process  naming  and  treat 
concurrency,  resource  allocation,  and  interprocess  protocol  as  a  data
flow  problem.  Interprocess  protocols  are  treated  simply  as  simplex  data
streams.  The  job  control  language  provided  by  the  UNIX  shell  becomes  a
pseudo  data  flow  language  and  resource  allocation  is  transparent  to  the
programmer.  However,  there  are  a  considerable  number  of  programmer 
protocols  which  are  not  served  by  "pipes."  As  acknowledged  in  reference
[11],  "pipes"  cannot  be  used  to  construct  multi-server  subsystems.

UNIX  will  support  general  interprocess  communication  protocols,  but 
these  are  not  generated  by  the  shell.  These  can  be  programmed  as  a  set 
of  child  processes  whose  "pipes"  have  been  set  up  by  a  parent  process.
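
As a rough sketch of such a parent-constructed protocol, the parent below creates a child with two simplex pipes, one in each direction, and runs a request/response exchange over them. It assumes Python's subprocess module. The names SERVER_SOURCE, start_server, request, and stop_server, and the "ack:" reply format, are hypothetical and chosen only for illustration.

```python
import subprocess
import sys

# Hypothetical child program: answers every request line with an acknowledgement.
SERVER_SOURCE = r"""
import sys
for line in sys.stdin:
    sys.stdout.write("ack:" + line)
    sys.stdout.flush()
"""


def start_server():
    """Parent side: create the child together with one pipe per direction."""
    return subprocess.Popen(
        [sys.executable, "-c", SERVER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line buffered on the parent's side
    )


def request(server, message):
    """Write one request line, then block until the matching response line arrives."""
    server.stdin.write(message + "\n")
    server.stdin.flush()
    return server.stdout.readline().rstrip("\n")


def stop_server(server):
    """Closing the request pipe gives the child end-of-file, so it exits."""
    server.stdin.close()
    status = server.wait()
    server.stdout.close()
    return status
```

The wiring lives entirely in the parent program; nothing equivalent can be written as a linear shell pipeline, which is the gap the text identifies.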

A  Research  Direction

If  we  are  to  be  successful  in  distributing  programs  across  highly 
distributed  systems,  we  must  provide  the  programmer  of  dynamically 
interconnected  cooperating  processes  a  job  control  language  (software 
configuration  control)  as  easy  to  use  as  Hoare's  communicating
sequential  processes.  It  seems  that  the  most  promising  direction  is  to 
extend  the  concept  of  the  UNIX  shell  to  automatically  generate  the  more 
complex  protocols  available  to  the  parent  processes  previously 
described.  It  must  then  also  be  extended  to  generate
(representations  of)  distributable  configurations  of  communicating 
processes. 

Work  in  this  area  is  underway  at  Kansas  State  University.  The 
project  involves  development  of  a  Network  Adaptable  Executive  (NADEX)
[12].  The  attempt  is  to  permit  the  user  to  specify  data  flow  at  the
command  level  and  have  the  command  interpreter  generate  a  distributable
software  configuration  of  nodes  connected  by  full  duplex  data  transfer
stream  connections  (DTS  connections)  to  form  an  undirected  graph.  In
general,  a  node  may  be  thought  of  as  a  process.  Each  of  the  connections
consists  of  two  independent  bi-directional  data  transfer  streams.  One
of  these  streams  uses  small  parameters  while  the  other  uses  a
standard-sized  data  buffer.  The  data  buffers  carry  along  with  them  size 
and  status  indicators,  whereas  the  parameter  buffers  contain  only  a 
small  amount  of  user-supplied  data. 
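
One way to picture a DTS connection is as a pair of record types. The sketch below is an assumption-laden illustration, not the NADEX data layout: the class names, the 512-byte data buffer, the 8-byte parameter limit, and the "ok" status value are all hypothetical, since the text gives no concrete sizes or status codes.

```python
from collections import deque
from dataclasses import dataclass, field

DATA_BUFFER_BYTES = 512  # assumed "standard" data buffer size
PARAM_BYTES = 8          # assumed upper bound for a "small" parameter


@dataclass
class ParameterBuffer:
    """Carries only a small amount of user-supplied data."""
    user_data: bytes

    def __post_init__(self):
        if len(self.user_data) > PARAM_BYTES:
            raise ValueError("parameter exceeds the assumed small-parameter size")


@dataclass
class DataBuffer:
    """Carries its contents together with size and status indicators."""
    contents: bytes
    size: int = field(init=False)
    status: str = "ok"  # hypothetical status indicator

    def __post_init__(self):
        if len(self.contents) > DATA_BUFFER_BYTES:
            raise ValueError("contents exceed the assumed data buffer size")
        self.size = len(self.contents)


@dataclass
class Stream:
    """One bi-directional stream: an independent queue for each direction."""
    forward: deque = field(default_factory=deque)
    backward: deque = field(default_factory=deque)


@dataclass
class DTSConnection:
    """Two independent streams: one for parameters, one for data buffers."""
    parameters: Stream = field(default_factory=Stream)
    data: Stream = field(default_factory=Stream)
```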

A  user  program  running  in  a  node  performs  serial  buffered  READ  and 
WRITE  operations  in  its  various  connections.  The  connections  are 
numbered,  and  the  program  attaches  particular  meanings  and  implements 
particular  protocols  for  each  of  its  connections.  A  connection  can 
connect  a  node  either  to  a  user  program  or  to  a  system  process  used  to 
access  a  file  or  an  I/O  device.  The  program  cannot  tell  the  difference 
between  these  modes  of  operation.  This  clearly  provides  all  of  the 
power  of  the  UNIX  pipelines  while  removing  the  linearity  constraint  on 
the  structure  of  the  connection  graph.  Also,  the  connections  are 
bi-directional  so  that,  for  example,  a  write-request/read-response 
protocol  to  access  a  random  file  can  be  implemented. 
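
The write-request/read-response protocol mentioned above can be simulated in a single program. The sketch below is an in-process model using threads and queues, not NADEX code: Endpoint, connect, file_node, user_node, and run_demo are hypothetical names, and using None as an end-of-requests marker is an assumption made for the demo.

```python
import queue
import threading


class Endpoint:
    """One end of a bi-directional connection, with serial READ and WRITE."""

    def __init__(self, inbox, outbox):
        self._inbox = inbox
        self._outbox = outbox

    def write(self, item):
        self._outbox.put(item)

    def read(self):
        return self._inbox.get()  # blocks until the peer has written


def connect():
    """Create a connection and return its two endpoints."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    return Endpoint(b_to_a, a_to_b), Endpoint(a_to_b, b_to_a)


def file_node(conns, records):
    """System-side node: serves record requests arriving on connection 0."""
    while True:
        index = conns[0].read()
        if index is None:  # assumed end-of-requests marker
            break
        conns[0].write(records[index])


def user_node(conns, wanted, results):
    """User program: writes a request, then reads the response, on connection 0."""
    for index in wanted:
        conns[0].write(index)
        results.append(conns[0].read())
    conns[0].write(None)


def run_demo():
    user_end, file_end = connect()
    results = []
    nodes = [
        threading.Thread(target=file_node, args=([file_end], ["alpha", "beta", "gamma"])),
        threading.Thread(target=user_node, args=([user_end], [2, 0], results)),
    ]
    for node in nodes:
        node.start()
    for node in nodes:
        node.join()
    return results
```

Here run_demo() returns ["gamma", "alpha"]. The user node addresses its peer only by connection number, so the file node could be replaced by another user program without changing user_node.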

For  these  serial  buffered  READ  and  WRITE  operations,  a  priori 
protocol  knowledge  can  be  specified  to  the  underlying  data  flow 
implementation  (buffer  control)  to  enable  it  to  maintain  a  check  for
validity  of  user  protocol  (in  terms  of  data  flow)  during  execution. 
This  protocol  checking  is  critical  in  "un-debugged"  (user-written) 
nodes.  Examples  of  such  protocol  violations  occur  many  times  in  the 
facilities  of  SOLO  [13].  Deadlock  detection  is  also  performed  based  on
data  flow  in  a  configuration  which  is  distributed  across  machines
connected  by  a  network  IPC.  Multi-server  subsystems,  such  as  a  data  base
management  system,  are  implementable  as  a  configuration  with
multi-connection  READ  (multiple  condition  WAITs)  and  conditional  WRITE
operations  provided  on  data  transfer  streams.  Interconfiguration
connections  are  also  provided.  Finally,  the  command  interpreter  and  the
node  interface  (PREFIX)  provide  all  the  mapping  of  logical  data  streams 
(ports)  onto  implementation  data  streams. 
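
The run-time protocol check described in this section might look like the following sketch. It assumes that the a priori protocol knowledge for each connection can be written as a cyclic sequence of READ and WRITE operations. That representation, and the names ProtocolViolation and ProtocolChecker, are hypothetical and are not taken from NADEX.

```python
class ProtocolViolation(Exception):
    """Raised when a node issues an operation its declared protocol forbids."""


class ProtocolChecker:
    """Checks each READ/WRITE against a declared per-connection cycle."""

    def __init__(self, expected):
        # expected maps a connection number to the cycle of operations
        # allowed on it, e.g. {0: ("WRITE", "READ")} for request/response.
        self._expected = {conn: tuple(ops) for conn, ops in expected.items()}
        self._position = {conn: 0 for conn in self._expected}

    def check(self, connection, operation):
        """Validate one operation and advance that connection's position."""
        if connection not in self._expected:
            raise ProtocolViolation(f"no protocol declared for connection {connection}")
        cycle = self._expected[connection]
        wanted = cycle[self._position[connection]]
        if operation != wanted:
            raise ProtocolViolation(
                f"connection {connection}: expected {wanted}, got {operation}"
            )
        self._position[connection] = (self._position[connection] + 1) % len(cycle)
```

In this model the buffer-control layer would call check before performing each operation, so a node that issues two WRITEs in a row on a request/response connection is stopped at the second one rather than left blocked.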